The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
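As an illustration of the patch-based strategy mentioned above, the following is a minimal sketch of how a sample that is too large to process at once might be tiled into overlapping windows. It is not taken from the survey or from any participant's code; the function names `patch_starts` and `patch_windows`, and the choice to add one extra window at the border so that every voxel is covered, are assumptions made for this example.

```python
from itertools import product


def patch_starts(dim, patch, stride):
    """Start offsets along one axis so that windows of length `patch` cover it."""
    if patch > dim:
        raise ValueError("patch is larger than the axis it should tile")
    starts = list(range(0, dim - patch + 1, stride))
    # Assumption for this sketch: add a final window flush with the border
    # if the regular grid would otherwise leave the last elements uncovered.
    if starts[-1] != dim - patch:
        starts.append(dim - patch)
    return starts


def patch_windows(shape, patch_size, stride):
    """Yield one tuple of slices per patch; works for any number of axes."""
    per_axis = [
        patch_starts(d, p, s) for d, p, s in zip(shape, patch_size, stride)
    ]
    for corner in product(*per_axis):
        yield tuple(slice(c, c + p) for c, p in zip(corner, patch_size))


if __name__ == "__main__":
    # Hypothetical 3D sample of shape (5, 6, 7) tiled with 4x4x4 patches.
    for window in patch_windows((5, 6, 7), (4, 4, 4), (2, 2, 2)):
        print(window)
```

Each yielded tuple can be used directly to index an array-like object (for example `volume[window]` on a NumPy array), so only one patch needs to be held in memory at a time during training.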
translated by 谷歌翻译
With the success of Vision Transformers (ViTs) in computer vision tasks, recent work tries to optimize the performance and complexity of ViTs to enable efficient deployment on mobile devices. Multiple approaches have been proposed to accelerate the attention mechanism, improve inefficient designs, or incorporate mobile-friendly lightweight convolutions to form hybrid architectures. However, ViT and its variants still have higher latency or considerably more parameters than lightweight CNNs, and this holds even in comparison with the years-old MobileNet. In practice, latency and size are both crucial for efficient deployment on resource-constrained hardware. In this work, we investigate a central question: can transformer models run as fast as MobileNet and maintain a similar size? We revisit the design choices of ViTs and propose an improved supernet with low latency and high parameter efficiency. We further introduce a fine-grained joint search strategy that can find efficient architectures by optimizing latency and number of parameters simultaneously. The proposed models, EfficientFormerV2, achieve about $4\%$ higher top-1 accuracy than MobileNetV2 and MobileNetV2$\times1.4$ on ImageNet-1K with similar latency and parameters. We demonstrate that properly designed and optimized vision transformers can achieve high performance with MobileNet-level size and speed.
Surgery is the only viable treatment for cataract patients with visual acuity (VA) impairment. Clinically, to assess the necessity of cataract surgery, it is crucial to predict postoperative VA accurately before surgery by analyzing multi-view optical coherence tomography (OCT) images. Unfortunately, due to complicated fundus conditions, determining postoperative VA remains difficult for medical experts. Deep learning methods for this problem have been developed in recent years. Although effective, these methods still face several issues: they do not efficiently explore potential relations between multi-view OCT images, they neglect the key role of clinical prior knowledge (e.g., the preoperative VA value), and they use only regression-based metrics, which lack a reference. In this paper, we propose a novel Cross-token Transformer Network (CTT-Net) for postoperative VA prediction that analyzes both the multi-view OCT images and the preoperative VA. To effectively fuse multi-view features of OCT images, we develop cross-token attention that can restrict redundant or unnecessary attention flow. Further, we utilize the preoperative VA value to provide more information for postoperative VA prediction and to facilitate fusion between views. Moreover, we design an auxiliary classification loss to improve model performance and to assess VA recovery more thoroughly, avoiding the limitation of using only regression metrics. To evaluate CTT-Net, we build a multi-view OCT image dataset collected from our collaborative hospital. A set of extensive experiments validates the effectiveness of our model compared to existing methods on various metrics. Code is available at: https://github.com/wjh892521292/Cataract OCT.
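Deep learning models have been found to be vulnerable to adversarial examples: a small perturbation of the input can cause a wrong prediction. Most existing work on adversarial image generation tries to attack as many models as possible, while few efforts address the perceptual quality of the adversarial examples. High-quality adversarial examples matter for many applications, especially privacy preservation. In this work, we develop a framework based on the minimum noticeable difference (MND) concept to generate adversarial privacy-preserving images that have minimal perceptual difference from the clean images yet are able to attack deep learning models. To this end, an adversarial loss is first proposed so that the adversarial images successfully attack the deep learning models. Then, a perceptual quality-preserving loss is developed by taking into account the magnitude of the perturbation and the structural and gradient changes it causes, with the aim of preserving high perceptual quality in adversarial image generation. To the best of our knowledge, this is the first work to explore quality-preserving adversarial image generation based on the MND concept for privacy preservation. To evaluate its performance in terms of perceptual quality, deep models for image classification and face recognition are tested with the proposed method and several anchor methods. Extensive experimental results demonstrate that the proposed MND framework generates adversarial images with remarkably better performance metrics (e.g., PSNR, SSIM, and MOS) than the adversarial images generated with the anchor methods.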
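As a severely ill-posed problem, single image super-resolution (SISR) has been widely studied in recent years. The main task of SISR is to recover the information lost in the degradation process. According to the Nyquist sampling theorem, degradation leads to aliasing, which makes it hard to restore the correct textures from low-resolution (LR) images. In practice, adjacent patches in natural images exhibit correlation and self-similarity. This paper takes self-similarity into account and proposes a hierarchical image super-resolution network (HSRNet) to suppress the influence of aliasing. We consider the SISR problem from an optimization perspective and propose an iterative solution pattern based on the half-quadratic splitting (HQS) method. To explore the texture with a local image prior, we design a hierarchical exploration block (HEB) and progressively enlarge the receptive field. Furthermore, multi-level spatial attention (MSA) is devised to capture the relations among adjacent features and to enhance high-frequency information, which plays a key role in visual experience. Experimental results show that HSRNet achieves better quantitative and visual performance than other works and relieves aliasing more effectively.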
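The abstract above does not spell out its formulation, so the following is only the textbook form of half-quadratic splitting, given as a sketch of what an HQS-based iterative solution typically looks like. All symbols are assumptions introduced for this example: $y$ is the observed LR image, $H$ the degradation operator, $x$ the high-resolution estimate, $z$ the auxiliary variable, $\Phi$ the image prior, and $\lambda$, $\mu$ the regularization and penalty weights.

```latex
\begin{align}
\hat{x} &= \arg\min_{x}\; \tfrac{1}{2}\lVert y - Hx \rVert_2^2 + \lambda\,\Phi(x) \\
(\hat{x}, \hat{z}) &= \arg\min_{x,\,z}\; \tfrac{1}{2}\lVert y - Hx \rVert_2^2
    + \lambda\,\Phi(z) + \tfrac{\mu}{2}\lVert z - x \rVert_2^2 \\
x_{k+1} &= \arg\min_{x}\; \tfrac{1}{2}\lVert y - Hx \rVert_2^2
    + \tfrac{\mu}{2}\lVert x - z_k \rVert_2^2 \\
z_{k+1} &= \arg\min_{z}\; \tfrac{\mu}{2}\lVert z - x_{k+1} \rVert_2^2
    + \lambda\,\Phi(z)
\end{align}
```

The first line is the original regularized problem, the second decouples the data term from the prior through $z$, and the last two lines are the alternating updates. In learned variants, the $z$-update is commonly replaced by a network acting as a denoiser or prior module; whether HSRNet does exactly this is not stated in the abstract.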
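Vision Transformers (ViTs) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks. However, due to the massive number of parameters and model design choices such as the attention mechanism, ViT-based models are generally slower than lightweight convolutional networks. Deploying ViTs for real-time applications is therefore particularly challenging, especially on resource-constrained hardware such as mobile devices. Recent efforts try to reduce the computational complexity of ViTs through network architecture search or hybrid designs with MobileNet blocks, yet the inference speed is still unsatisfactory. This leads to an important question: can transformers run as fast as MobileNet while obtaining high performance? To answer this, we first revisit the network architecture and operators used in ViT-based models and identify inefficient designs. Then we introduce a dimension-consistent pure transformer (without MobileNet blocks) as a design paradigm. Finally, we perform latency-driven slimming to obtain a series of final models dubbed EfficientFormer. Extensive experiments show the superiority of EfficientFormer in performance and speed on mobile devices. Our fastest model, EfficientFormer-L1, achieves $79.2\%$ top-1 accuracy on ImageNet-1K with only $1.6$ ms inference latency on iPhone 12 (compiled with CoreML), which is as fast as MobileNetV2$\times 1.4$ ($1.6$ ms, $74.7\%$ top-1), and our largest model, EfficientFormer-L7, obtains $83.3\%$ accuracy with only $7.0$ ms latency. Our work proves that properly designed transformers can reach extremely low latency on mobile devices while maintaining high performance.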
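Natural language video localization (NLVL) is an important task in the vision-language understanding area, which calls for an in-depth understanding of not only the computer vision and natural language sides individually but, more importantly, the interplay between the two. Adversarial vulnerability has been well recognized as a critical security issue of deep neural network models and requires prudent investigation. Despite extensive yet separate studies on video and language tasks, the adversarial robustness of vision-language joint tasks such as NLVL is currently less understood. This paper therefore aims to comprehensively investigate the adversarial robustness of NLVL models by examining three facets of vulnerability from both the attack and defense aspects. To achieve the attack goal, we propose a new adversarial attack paradigm called synonymous sentences-aware adversarial attack on NLVL (SNEAK), which captures the cross-modality interplay between the vision and language sides.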
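In online advertising, auto-bidding has become an essential tool for advertisers to optimize their preferred ad performance metrics by simply expressing high-level campaign objectives and constraints. Previous works design auto-bidding tools from the view of a single agent, without modeling the mutual influence between agents. In this paper, we instead consider this problem from the perspective of a distributed multi-agent system and propose a general $\underline{M}$ulti-$\underline{A}$gent reinforcement learning framework for $\underline{A}$uto-$\underline{B}$idding, namely MAAB, to learn the auto-bidding strategies. First, we investigate the competition and cooperation relations among auto-bidding agents and propose a temperature-regularized credit assignment to establish a mixed cooperative-competitive paradigm. By carefully making a competition and cooperation trade-off among agents, we can reach an equilibrium state that guarantees not only individual advertisers' utility but also the system performance (i.e., social welfare). Second, to avoid potential collusion behaviors of bidding low prices that may arise from cooperation, we further propose bar agents that set a personalized bidding bar for each agent and thereby alleviate the revenue degradation caused by cooperation. Third, to deploy MAAB in a large-scale advertising system, we propose a mean-field approach. By grouping advertisers with the same objective into a mean auto-bidding agent, the interactions among the large number of advertisers are greatly simplified, making it practical to train MAAB efficiently. Extensive experiments on an offline industrial dataset and on the Alibaba advertising platform demonstrate that our approach outperforms several baseline methods in terms of social welfare and revenue.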
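In the past decade, object detection has achieved significant progress in natural images but not in aerial images, owing to the massive variations in the scale and orientation of objects in aerial imagery. More importantly, the lack of large-scale benchmarks has become a major obstacle to the development of object detection in aerial images (ODAI). In this paper, we present a large-scale Dataset of Object deTection in Aerial images (DOTA) and comprehensive baselines for ODAI. The proposed DOTA dataset contains 1,793,658 object instances of 18 categories with oriented-bounding-box annotations, collected from 11,268 aerial images. Based on this large-scale and well-annotated dataset, we build baselines covering 10 state-of-the-art algorithms with over 70 configurations, for which the speed and accuracy of each model have been evaluated. Furthermore, we provide a code library for ODAI and build a website for evaluating different algorithms. Previous challenges run on DOTA have attracted more than 1300 teams worldwide. We believe that the expanded large-scale DOTA dataset, the extensive baselines, the code library, and the challenges can facilitate the design of robust algorithms and reproducible research on the problem of object detection in aerial images.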
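To address the problem of autonomous navigation in complex environments, this paper presents an efficient motion planning approach. Considering the challenges of large-scale, partially unknown complex environments, a three-layer motion planning framework is elaborately designed, comprising global path planning, local path optimization, and time-optimal velocity planning. Compared with existing approaches, the novelty of this work is twofold: 1) a novel heuristic-guided pruning strategy for motion primitives is proposed and fully integrated into the state-lattice-based global path planner to further improve the computational efficiency of graph search, and 2) a new soft-constrained local path optimization approach is proposed, in which the sparse banded system structure of the underlying optimization problem is fully exploited to solve the problem efficiently. We validate the safety, smoothness, flexibility, and efficiency of our approach in various complex simulation scenarios and challenging real-world tasks. The results show that, compared with the recent Bézier-curve-based state space sampling approach, computational efficiency is improved by 66.21% in the global planning stage, while the motion efficiency of the robot is improved by 22.87%. We name the proposed motion planning framework E$\mathrm{^3}$MoP, where the number 3 means not only that our approach is a three-layer framework but also that the proposed approach is efficient in three stages.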